Measuring Performance when Positives are Rare: Relative Advantage versus Predictive Accuracy

Authors

  • Stephen H. Muggleton
  • Christopher H. Bryant
  • Ashwin Srinivasan
Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
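The abstract's last point, that predictive accuracy barely separates models when positives are rare, can be illustrated with a small sketch. The counts below (10 positives among 10,000 proteins, 20 false positives per model) and the model names are hypothetical values chosen for illustration, not data from the paper. The sketch uses ordinary accuracy and recall only; it does not implement Relative Advantage, whose definition is not given in this abstract.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all examples classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp, fn):
    """Fraction of the positives that the model covers."""
    return tp / (tp + fn)


# Hypothetical test set (assumption, not from the paper):
# 10 rare positives among 10,000 proteins.
POSITIVES = 10
NEGATIVES = 9990

# Three hypothetical models that exclude the same number of negatives
# but cover very different numbers of positives.
models = {
    "covers_0_of_10": {"tp": 0, "fp": 20},
    "covers_5_of_10": {"tp": 5, "fp": 20},
    "covers_10_of_10": {"tp": 10, "fp": 20},
}


def summarise(models):
    """Return {model name: (accuracy, recall)} for each model."""
    rows = {}
    for name, m in models.items():
        fn = POSITIVES - m["tp"]   # positives the model misses
        tn = NEGATIVES - m["fp"]   # negatives the model correctly excludes
        rows[name] = (accuracy(m["tp"], fn, m["fp"], tn), recall(m["tp"], fn))
    return rows


results = summarise(models)
for name, (acc, rec) in results.items():
    print(f"{name}: accuracy={acc:.4f} recall={rec:.2f}")
```

Under these assumed counts, accuracy moves only from 0.9970 to 0.9980 while the share of positives covered goes from none to all, which is the kind of insensitivity the abstract describes.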

Publication date: 2007